
## ProteinConformers: Benchmark Dataset for Simulating Protein Conformational Landscape Diversity and Plausibility

### 1. Install Environments
#### Install bioemu environment
##### Stage-1 Create bioemu environment
* `pip install bioemu`

##### Stage-2 Create colabfold_env environment
* `conda create -n colabfold_env python=3.10`
* `conda activate colabfold_env`
* `pip install uv`
* `export VENV_FOLDER=/mnt/rna01/chenw/anaconda3/envs/colabfold_env` (replace this with the path of your own `colabfold_env` environment)
* `uv pip install --python ${VENV_FOLDER}/bin/python 'colabfold[alphafold-minus-jax]==1.5.4'`
* `uv pip install --python ${VENV_FOLDER}/bin/python --force-reinstall "jax[cuda12]==0.4.35" "numpy==1.26.4"`
* `export SITE_PACKAGES_DIR=${VENV_FOLDER}/lib/python3.10/site-packages`
* Set `SCRIPT_DIR` to the directory that contains `modules.patch` and `batch.patch` before running the two `patch` commands below.
* `patch ${SITE_PACKAGES_DIR}/alphafold/model/modules.py ${SCRIPT_DIR}/modules.patch`
* `patch ${SITE_PACKAGES_DIR}/colabfold/batch.py ${SCRIPT_DIR}/batch.patch`
* `touch ${VENV_FOLDER}/.COLABFOLD_PATCHED`
* `BIOEMU_COLABFOLD_DIR` is the path of the environment created above, here `/mnt/rna01/chenw/anaconda3/envs/colabfold_env`.
* Edit `/mnt/rna01/chenw/WorkSpace_Bio/bioemu/src/bioemu/get_embeds.py` (adjust the path to your own bioemu checkout) and change the line `return subprocess.run(cmd, env=colabfold_env, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)` to `return subprocess.run(['conda', "run", "-n", "colabfold_env", *cmd], stdout=subprocess.PIPE, stderr=subprocess.STDOUT)`, so that the command runs inside the `colabfold_env` conda environment.
* `pip install esm==3.0.4`
* `pip install -i https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple tokenizers`
* `pip install -i https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple transformers`
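
The `get_embeds.py` change in the list above prefixes the command with `conda run -n colabfold_env`. The following is a minimal sketch of that pattern in isolation. The helper names `wrap_with_conda_run` and `run_in_colabfold_env` are hypothetical and do not exist in bioemu; only the `subprocess.run` arguments are taken from the edit described above.

```python
import subprocess
from typing import List, Sequence


def wrap_with_conda_run(cmd: Sequence[str], env_name: str = "colabfold_env") -> List[str]:
    """Prefix a command so that it executes inside the named conda environment.

    Hypothetical helper for illustration; not part of bioemu.
    """
    return ["conda", "run", "-n", env_name, *cmd]


def run_in_colabfold_env(cmd: Sequence[str]) -> subprocess.CompletedProcess:
    """Run cmd through `conda run`, merging stderr into stdout as in the edited line."""
    return subprocess.run(
        wrap_with_conda_run(cmd),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
    )
```

Because the command goes through `conda run`, the caller no longer passes a custom `env=` mapping; the environment is selected by name instead, which is why the environment must be called `colabfold_env`.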

#### Install esmdiff environment
Please refer to the README file of the esmdiff project.

### 2. Running the Benchmark Code
#### 2.1. Sample conformations using bioemu
```bash
source activate bioemu
export BIOEMU_COLABFOLD_DIR=/mnt/rna01/chenw/anaconda3/envs/colabfold_env
export CUDA_HOME=/mnt/apps/cuda_12.1.0
python tools_generate_conformations.py \
    --fasta_file_path /mnt/rna01/chenw/WorkSpace_Bio/ProteinDynamicBenchmark/configs/casp_data_new_bioemu/casp14_new.fasta \
    --sampler_type bioemu \
    --sample_size 3000 \
    --save_path /mnt/rna01/chenw/WorkSpace_Bio/ProteinDynamicBenchmark/output3/bioemu
```
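
Both exported variables must point to existing directories on your machine, and the paths shown above are specific to the authors' cluster. A small pre-flight check could look like the sketch below; the function name `find_bad_dir_vars` is hypothetical and is not part of this repository.

```python
import os
from typing import List, Mapping, Sequence

# Variables exported in the bioemu sampling commands above.
REQUIRED_DIR_VARS = ("BIOEMU_COLABFOLD_DIR", "CUDA_HOME")


def find_bad_dir_vars(
    env: Mapping[str, str], names: Sequence[str] = REQUIRED_DIR_VARS
) -> List[str]:
    """Return the variable names that are unset or do not point to a directory."""
    return [name for name in names if not os.path.isdir(env.get(name, ""))]


if __name__ == "__main__":
    bad = find_bad_dir_vars(os.environ)
    if bad:
        print("Unset or invalid directory variables: " + ", ".join(bad))
    else:
        print("Environment variables point to existing directories.")
```

Running this before sampling catches a forgotten `export` early, before a long sampling job is started.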

#### 2.2. Sample conformations using esmdiff
```bash
python tools_generate_conformations.py \
    --fasta_file_path /mnt/rna01/chenw/WorkSpace_Bio/ProteinDynamicBenchmark/configs/data/casp14.fa \
    --sampler_type esmdiff \
    --sample_size 5000 \
    --save_path /mnt/rna01/chenw/WorkSpace_Bio/ProteinDynamicBenchmark/output/esmdiff \
    --ckpt_path /mnt/rna01/chenw/WorkSpace_Bio/esmdiff/data/ckpt/release_v0.pt \
    --sample_mode ddpm \
    --sample_steps 1000 \
    --model_config_path /mnt/rna01/chenw/WorkSpace_Bio/ProteinDynamicBenchmark/configs/esmdiff/experiment/mdlm.yaml
```
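
The bioemu and esmdiff invocations share four flags (`--fasta_file_path`, `--sampler_type`, `--sample_size`, `--save_path`), and esmdiff adds sampler-specific ones. When scripting several runs, the argument list can be assembled programmatically. This is only a sketch: `build_sampling_cmd` is a hypothetical helper, the flag names are copied from the commands above, and the paths in the usage part are placeholders.

```python
import shlex
from typing import List, Mapping, Optional


def build_sampling_cmd(
    fasta_file_path: str,
    sampler_type: str,
    sample_size: int,
    save_path: str,
    extra_args: Optional[Mapping[str, object]] = None,
) -> List[str]:
    """Assemble the argument list for tools_generate_conformations.py."""
    cmd = [
        "python", "tools_generate_conformations.py",
        "--fasta_file_path", fasta_file_path,
        "--sampler_type", sampler_type,
        "--sample_size", str(sample_size),
        "--save_path", save_path,
    ]
    # Sampler-specific flags, e.g. ckpt_path or sample_mode for esmdiff.
    for flag, value in (extra_args or {}).items():
        cmd += [f"--{flag}", str(value)]
    return cmd


if __name__ == "__main__":
    # Placeholder paths; replace them with your own files.
    esmdiff_cmd = build_sampling_cmd(
        "path/to/casp14.fa", "esmdiff", 5000, "path/to/output/esmdiff",
        extra_args={"sample_mode": "ddpm", "sample_steps": 1000},
    )
    print(shlex.join(esmdiff_cmd))
```

The resulting list can be passed directly to `subprocess.run`, which avoids shell quoting problems with paths that contain spaces.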

### 3. Evaluation
#### 3.1. Calculate the free energy landscape
`python tools_calculate_free_energy_landscape.py`
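
For readers unfamiliar with the term, a free energy landscape is commonly estimated from sampled conformations by Boltzmann inversion of a histogram over two collective variables, F = -kT ln(P / P_max). The sketch below illustrates that general technique only. It is not taken from `tools_calculate_free_energy_landscape.py`, and the choice of collective variables, bin width, and temperature are assumptions made for illustration.

```python
import math
from collections import Counter
from typing import Dict, Iterable, Tuple

KB_KCAL_PER_MOL_K = 0.0019872041  # Boltzmann constant in kcal/(mol*K)


def free_energy_landscape(
    points: Iterable[Tuple[float, float]],
    bin_width: float,
    temperature: float = 300.0,
) -> Dict[Tuple[int, int], float]:
    """Boltzmann-invert a 2D histogram: F = -kT * ln(count / max_count).

    `points` holds one (cv1, cv2) pair per conformation, where cv1 and cv2
    are any two collective variables (assumed here, e.g. two projections).
    Only occupied bins are returned; the most populated bin has F = 0.
    """
    counts = Counter(
        (math.floor(x / bin_width), math.floor(y / bin_width)) for x, y in points
    )
    if not counts:
        raise ValueError("at least one point is required")
    max_count = max(counts.values())
    kt = KB_KCAL_PER_MOL_K * temperature
    return {b: -kt * math.log(c / max_count) for b, c in counts.items()}


if __name__ == "__main__":
    # Toy data: four conformations in one bin, one in a neighbouring bin.
    toy = [(0.1, 0.1)] * 4 + [(1.5, 0.2)]
    for bin_index, energy in sorted(free_energy_landscape(toy, bin_width=1.0).items()):
        print(bin_index, round(energy, 3))
```

Bins that no conformation visits have no finite estimate under this scheme, so they are simply omitted from the returned dictionary.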

#### 3.2. Calculate the pcps and pcpm metrics
##### 3.2.1. Calculate the pcps metric
`python tools_pcps.py`

##### 3.2.2. Calculate the pcpm metric
###### Calculate the pcpm metric for each protein
`python tools_pcpm_individual.py`
###### Calculate the pcpm metric for all the proteins
`python tools_pcpm_distribution.py`



## Citation
If you are using our code or model, please consider citing our work:
```bibtex
@article{ProteinConformers2025,
    author = {Yihang Zhou and Chen Wei and Minghao Sun and Jin Song and Yang Li and Yang Zhang},
    title = {ProteinConformers: Benchmark Dataset for Simulating Protein Conformational Landscape Diversity and Plausibility},
    year = {2025},
    doi = {},
    journal = {}
}
```